For my final project, I build on analysis we began in earlier CEE218X assignments. The Bay Area has some of the starkest inequality in the country, and I think it is important to recognize that not everyone who lives here makes a six-figure tech salary. As the region continues to develop and becomes more and more homogeneous, we must be increasingly mindful of how easily middle- and lower-income individuals and households can be left out of key policy decisions.

Pollution burden, traffic, asthma rates, cardiovascular disease, and drinking water quality are just a few indicators of overall well-being, which can vary greatly depending on the resources available to a household or individual. It’s easy to assume that everyone in the Bay Area has access to the resources they need, but the region is in many ways one of the most unequal in the country. To demonstrate this, I examine where and to what degree these indicators occur across the Bay Area. I then identify regions that may be especially vulnerable by looking at which indicators tend to occur together and which regions experience several of them at once.

My topic was inspired by pod discussions during week 3, in which my pod talked about the vulnerabilities Stanford allows students to face during extreme heat events, especially students with health conditions and low- or middle-income students who can’t purchase fans and other cooling supplies. This leaves less socioeconomically privileged students especially at risk, while the university simply sends out emails telling students to wear light clothing and close windows when it is hot, or to drink warm beverages when it is cold. As a FLI student myself, I often feel the effects of policies that are intentionally designed around high-income individuals.

I think this project could be impactful, both on campus and in the Bay Area as a whole, because it may identify regions that are especially vulnerable to policies increasingly tailored to class-privileged households.

Before examining regional indicators, I first created a simple map of Bay Area counties by income to get a general sense of where vulnerable regions might be located. I mapped household rather than individual income because the HINCP variable reports household (not individual) income.

Next, I create maps of asthma rates, cardiovascular disease rates, drinking water quality, traffic, and pollution burden. I chose factors that are outside the ‘sphere of influence’ of most individuals because they show which areas experience vulnerabilities that cannot be seen as the fault of any one individual or household. By contrast, low education rates or unemployment in a region might be overlooked in policymaking because, in theory, individuals could address them by choosing to pursue higher education or seek work.

Cardiovascular disease:

Asthma rates:

Drinking water quality:

Traffic:

Pollution burden:

Then, I create scatter plots showing the correlation between pairs of indicators that seem likely to coexist: asthma and cardiovascular disease prevalence, pollution burden and drinking water quality, and traffic and pollution burden. Though correlation does not equal causation, these plots are important because they show policymakers that regions experiencing one vulnerability often experience other, possibly related, vulnerabilities. Likewise, areas that do not experience one vulnerability may be less likely to experience others, meaning that the negative impacts of policies are likely to be concentrated in small, specific regions. As shown in the plots below, there is some correlation between asthma and cardiovascular disease prevalence, and between traffic and pollution burden, while little exists between pollution burden and water quality.
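As a rough illustration of how one of these pairwise comparisons could be set up, the sketch below builds a small synthetic data frame and computes a Pearson correlation alongside a scatter plot. The data are made up purely for demonstration and are not the project's tract-level values. The column names `Cardiovascular` and `Asthma` mirror those in the regression calls later in this report, the data frame name `toy` is hypothetical, and taking the log of asthma here is an assumption carried over from those regressions.

```r
# Synthetic stand-in data for illustration only; NOT the project's tract data.
set.seed(1)
n <- 200
cardio <- runif(n, min = 5, max = 20)
asthma <- exp(2.2 + 0.14 * cardio + rnorm(n, sd = 0.4))
toy <- data.frame(Cardiovascular = cardio, Asthma = asthma)

# Pearson correlation between the predictor and the logged response,
# dropping any rows with missing values.
r <- cor(toy$Cardiovascular, log(toy$Asthma), use = "complete.obs")
print(r)

# Scatter plot with a fitted least-squares line overlaid.
plot(toy$Cardiovascular, log(toy$Asthma),
     xlab = "Cardiovascular", ylab = "log(Asthma)",
     main = "Synthetic example: asthma vs. cardiovascular disease")
abline(lm(log(Asthma) ~ Cardiovascular, data = toy), col = "red")
```

The same pattern would apply to the other two pairs by swapping in the relevant columns.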

Finally, I perform linear regression analyses to further examine the relationships within the three pairs of indicators plotted above. In each model, the log of one indicator is regressed on the other. The strongest relationship is between asthma and cardiovascular disease (R-squared of about 0.65), meaning that regions with a high prevalence of one likely have a high prevalence of the other. Traffic and pollution burden show a statistically significant but weak relationship (R-squared of about 0.13), and pollution burden and drinking water quality show no significant relationship (p-value of about 0.34).

## 
## Call:
## lm(formula = log(Asthma) ~ Cardiovascular, data = bay_asthma_cardio_tract)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.96882 -0.26787 -0.01327  0.22609  1.41453 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    2.200969   0.030096   73.13   <2e-16 ***
## Cardiovascular 0.141410   0.002641   53.55   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4124 on 1578 degrees of freedom
## Multiple R-squared:  0.6451, Adjusted R-squared:  0.6448 
## F-statistic:  2868 on 1 and 1578 DF,  p-value: < 2.2e-16

## 
## Call:
## lm(formula = log(Water) ~ Pollution, data = bay_water_pollution_tract)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.89476 -0.24221  0.06508  0.33574  1.32597 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.561613   0.044765 124.239   <2e-16 ***
## Pollution   -0.001157   0.001209  -0.957    0.339    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4747 on 1576 degrees of freedom
## Multiple R-squared:  0.0005805,  Adjusted R-squared:  -5.369e-05 
## F-statistic: 0.9153 on 1 and 1576 DF,  p-value: 0.3388

## 
## Call:
## lm(formula = log(Pollution) ~ Traffic, data = bay_pollution_traffic_tract)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.14780 -0.16033  0.02566  0.19161  0.63868 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3.378e+00  1.222e-02  276.38   <2e-16 ***
## Traffic     1.324e-04  8.552e-06   15.48   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2787 on 1572 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.1322, Adjusted R-squared:  0.1317 
## F-statistic: 239.5 on 1 and 1572 DF,  p-value: < 2.2e-16
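To make the three summaries easier to compare, the sketch below re-expresses the reported slopes and R-squared values in more interpretable terms. The numbers are copied from the outputs above, and the data frame name `fits` and its column names are hypothetical. Because each response is logged, a one-unit increase in the predictor multiplies the response by exp(slope). With a single predictor, the correlation between the predictor and the logged response has magnitude sqrt(R-squared) and takes the sign of the slope.

```r
# Slopes and R-squared values copied from the three summary(lm()) outputs above.
fits <- data.frame(
  model     = c("log(Asthma) ~ Cardiovascular",
                "log(Water) ~ Pollution",
                "log(Pollution) ~ Traffic"),
  slope     = c(0.141410, -0.001157, 1.324e-04),
  r_squared = c(0.6451, 0.0005805, 0.1322)
)

# In a log-linear model, a one-unit increase in the predictor multiplies the
# response by exp(slope); express that as a percent change per unit.
fits$pct_change_per_unit <- 100 * (exp(fits$slope) - 1)

# With one predictor, the Pearson correlation between the predictor and the
# logged response has magnitude sqrt(R^2) and the sign of the slope.
fits$correlation <- sign(fits$slope) * sqrt(fits$r_squared)

print(fits, digits = 3)
```

Read this way, the asthma model implies roughly a 15% higher asthma value for each one-unit increase in the cardiovascular indicator, with a correlation of about 0.80. The traffic model corresponds to a correlation of about 0.36, and the water model to a correlation near zero.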

All in all, I believe my research demonstrates that some regions in the Bay Area may be especially vulnerable to the negative impacts of policies designed around socioeconomically privileged regions. It does so by showing that certain indicators are heavily concentrated in particular parts of the Bay Area and that some indicators are likely to occur together. It is important that policymakers pay extra attention to these neighborhoods and communities, especially as the Bay Area becomes less and less accessible to individuals and households of different backgrounds, so that no one is left out of the region's future development.